CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units

Authors

  • Anabela Barreiro
  • Francisco Raposo
  • Tiago Luís
Abstract

Currently available alignment tools and procedures for marking up alignments overlook non-contiguous multiword units because they are too complex to handle within the bounds of the proposed alignment methodologies. This paper presents the CLUE-Aligner (Cross-Language Unit Elicitation Aligner), a web alignment tool designed for manual annotation of pairs of paraphrastic and translation units, representing both contiguous and non-contiguous multiwords and phrasal expressions found in monolingual or bilingual parallel sentences. Non-contiguous block alignments are necessary to express alignments between multiwords or phrases that contain insertions, i.e., words that are not part of the multiword unit or phrase. CLUE-Aligner also allows the alignment of smaller individual or multiword units inside non-contiguous multiword units. The interactive web application was developed under the scope of the eSPERTo project, which aims to build a linguistically enhanced paraphrasing system. However, a tool for manual annotation of alignments and for visualization of automatic phrase alignments can also prove useful in human and machine translation evaluation.
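To make the notion of a non-contiguous block alignment concrete, the following is a minimal sketch in Python. It is not CLUE-Aligner's actual data model, which the abstract does not describe; the class `AlignedUnit`, the helpers `is_contiguous` and `insertions`, and the example sentence pair are all hypothetical and introduced only for illustration. The sketch represents each aligned unit as a pair of token-index sets, so a unit may skip over inserted words, and a smaller unit may be aligned inside the gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedUnit:
    """Hypothetical aligned pair: token indices on the source and target side."""
    source: frozenset
    target: frozenset


def is_contiguous(indices):
    """True if the (non-empty) index set covers an unbroken run of tokens."""
    ordered = sorted(indices)
    return bool(ordered) and ordered == list(range(ordered[0], ordered[-1] + 1))


def insertions(indices, tokens):
    """Tokens lying inside the unit's span that are not part of the unit."""
    return [tokens[i] for i in range(min(indices), max(indices) + 1)
            if i not in indices]


# Illustrative monolingual (paraphrase) pair, invented for this sketch.
source = "He took the new data into consideration".split()
target = "He considered the new data".split()

# Non-contiguous unit: "took ... into consideration" <-> "considered".
outer = AlignedUnit(frozenset({1, 5, 6}), frozenset({1}))
# Smaller unit aligned inside the gap: "the new data" <-> "the new data".
inner = AlignedUnit(frozenset({2, 3, 4}), frozenset({2, 3, 4}))

print(is_contiguous(outer.source))       # whether the outer unit has a gap
print(insertions(outer.source, source))  # words inserted inside the unit
```

Storing index sets rather than start and end offsets is one possible design choice: it lets the same structure cover contiguous units, non-contiguous units, and units nested inside the insertions of another unit.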


Similar resources

Exploitation of an Arabic Language Resource for Machine Translation Evaluation: using Buckwalter-based Lookup Tool to Augment CMU Alignment Algorithm

Voss et al. (2006) analyzed newswire translations of three DARPA GALE Arabic-English MT systems at the segment level in terms of subjective judgment scores, automated metric scores, and correlations among these different score types. At this level of granularity, the correlations are weak. In this paper, we begin to reconcile the subjective and automated scores that underlie these correlations ...


Machine Translation of Non-Contiguous Multiword Units

Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine translation. This paper presents an...


Exploitation of an Arabic Language Resource for MT Evaluation: Using Buckwalter-based Lookup Tool to Augment CMU Alignment Algorithm

Voss et al. (2006) analyzed newswire translations of three DARPA GALE Arabic-English MT systems at the segment level in terms of subjective judgment scores, automated metric scores, and correlations among these different score types. At this level of granularity, the correlations are weak. In this paper, we begin to reconcile the subjective and automated scores that underlie these correlations ...


Corpus Aligner (CorAl) Evaluation on English-Croatian Parallel Corpora

An increasing demand for new language resources of recent EU members and acceding countries has in turn initiated the development of different language tools and resources, such as alignment tools and corresponding translation memories for new language pairs. The primary goal of this paper is to provide a description of a free sentence alignment tool CorAl (Corpus Aligner), developed at the F...


Using external sources of bilingual information for on-the-fly word alignment

In this paper we present a new and simple language-independent method for word alignment based on the use of external sources of bilingual information such as machine translation systems. We show that the few parameters of the aligner can be trained on a very small corpus, which leads to results comparable to those obtained by the state-of-the-art tool GIZA++ in terms of precision. Regarding oth...




Publication date: 2016